DSC 2003 Working Papers
Abstract
Kriging is one of the most widely used prediction methods in spatial data analysis. This paper examines which steps of the underlying algorithm can be performed in parallel on a PVM cluster. It is shown that some properties of the so-called kriging equations can be used to improve the parallelized version of the algorithm. The implementation is based on R and PVM. An example shows the impact of different parameter settings and cluster configurations on computing performance.

1 The classical form of kriging

One of the aims of geostatistical analysis is the prediction of a variable of interest at unmeasured locations. The prediction method used most often is kriging. It is based on the concept of so-called regionalized variables $Z(x)$ with $x \in D \subset \mathbb{R}^d$, $d = 2, 3$. Analysis usually starts with a spatial dataset consisting of measurements $Z(x_i)$, $i \in I$, at a grid of observation points $x_i$. These values are treated as a realization of the underlying stochastic process $Z(x)$ and modeled with the usual regression setup

$$Z(x) = m(x) + \varepsilon(x), \qquad E(\varepsilon(x)) = 0.$$

Of course, it is not possible to make inferences from only one realization. The idea of regionalized variables is to partition the region $D$ into nearly independent parts and to use them as different realizations. This is only valid if a stationarity assumption holds:

$$m(x) = \text{const}, \quad x \in D \tag{1}$$

$$\operatorname{Cov}(Z(x_i), Z(x_j)) = C(\mathbf{h}), \quad \mathbf{h} = x_i - x_j, \quad x_i, x_j \in D \tag{2}$$

That is, the mean $m(x)$ and the covariance function $C(\mathbf{h})$ are assumed to be translation invariant. A more general assumption, called intrinsic stationarity, only requires that the variance of the increments of $Z(\cdot)$ be invariant with respect to translation:

$$\operatorname{Var}(Z(x_i) - Z(x_j)) = 2\gamma(\mathbf{h}), \quad \mathbf{h} = x_i - x_j, \quad x_i, x_j \in D \tag{3}$$

Equation (3) also introduces the semivariogram $\gamma(\mathbf{h})$. It is connected with the covariance function via $\gamma(\mathbf{h}) = C(\mathbf{0}) - C(\mathbf{h})$ if $C(\mathbf{h})$ exists. In simple cases $\gamma(\cdot)$ depends only on the distance $h = |\mathbf{h}|$.
This leads to isotropic variograms and covariance functions. Both are used later to determine the system matrix in the prediction step, so it is necessary to estimate the semivariogram. The usual estimator in the isotropic case (where $\gamma(\mathbf{h}) = \gamma(h)$ with $h = |\mathbf{h}|$) is

$$\hat{\gamma}(h) = \frac{1}{2N(h)} \sum_{|x_i - x_j| \in L(h)} \bigl(Z(x_i) - Z(x_j)\bigr)^2 \tag{4}$$

where $L(h)$ is an interval (lag) $[h_l, h_u]$ containing $h$ and $N(h)$ denotes the number of pairs $(x_i, x_j)$ falling into the lag $L(h)$. Valid semivariogram functions must be conditionally negative definite. This makes it necessary to fit a valid semivariogram model to the estimated $\hat{\gamma}(h)$. Commonly used semivariogram models are, for example, the spherical and the exponential model. Most semivariogram models are parametrized by the so-called nugget parameter (which describes a discontinuity at the origin), a range parameter (the correlation radius) and the sill parameter (the maximum semivariogram value, attained beyond the range).

The estimator used by kriging for prediction at a location $x_0 \in D$ has the linear form

$$\hat{Z}(x_0) = \lambda^\top Z \tag{5}$$

with the kriging weights $\lambda = (\lambda_i)_{i \in I}$ and the data vector $Z = (Z(x_i))_{i \in I}$. Depending on the modeling assumptions regarding $m(x)$, two variants of kriging can be distinguished: ordinary kriging, which uses a constant trend model $m(x) = \text{const}$, and universal kriging, which models $m(x)$ with a parameter-linear setup $m(x) = \theta^\top f(x)$ with a parameter vector $\theta \in \mathbb{R}^p$ and a set of regression functions $f_j(x)$, $j = 1, \ldots, p$. The kriging weights $\lambda$ are chosen by minimizing the prediction variance $\sigma_K^2(x_0) = \operatorname{Var}(\hat{Z}(x_0) - Z(x_0))$ under the condition of unbiasedness $E(\hat{Z}(x_0)) = E(Z(x_0))$, which is also called the universality condition in this context. It evaluates to $\sum_{i \in I} \lambda_i = 1$ for ordinary kriging and to $F^\top \lambda = f_0$ for universal kriging, using the design matrix $F = (f(x_i)^\top)_{i \in I}$ and $f_0 = f(x_0)$.
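To make the semivariogram estimator (4) and the role of the nugget, range and sill parameters more concrete, the following is a minimal Python sketch, not the paper's R implementation. The function names `empirical_semivariogram` and `spherical` and the use of half-open lag intervals are assumptions made for illustration; the spherical formula is the standard textbook form with the sill understood as the total sill.

```python
import math


def empirical_semivariogram(points, values, lag_edges):
    """Estimator (4): half the mean squared increment within each distance lag.

    lag_edges holds increasing lag boundaries; lag k is treated as the
    half-open interval [lag_edges[k], lag_edges[k + 1]) so that no pair is
    counted twice (an assumption; the text uses closed intervals).
    """
    n_lags = len(lag_edges) - 1
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            for k in range(n_lags):
                if lag_edges[k] <= h < lag_edges[k + 1]:
                    sums[k] += (values[i] - values[j]) ** 2
                    counts[k] += 1
                    break
    # Lags without any pair have no estimate and are reported as None.
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


def spherical(h, nugget, sill, rng):
    """Spherical semivariogram model with total sill `sill` and range `rng`."""
    if h == 0:
        return 0.0  # gamma(0) = 0; the nugget is the jump just after the origin
    if h >= rng:
        return sill  # beyond the range the model stays at the sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)
```

In practice the model parameters would be fitted to the estimated values, for example by least squares, which is the interactive step the text refers to.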
Minimizing $\sigma_K^2(x_0)$ finally leads to the kriging equation

$$\begin{pmatrix} C & F \\ F^\top & 0 \end{pmatrix} \begin{pmatrix} \lambda \\ \theta \end{pmatrix} = \begin{pmatrix} c_0 \\ f_0 \end{pmatrix} \tag{6}$$

with the covariance matrix $C = (C(x_i - x_j))_{i,j \in I}$ and the vector $c_0 = (C(x_0 - x_i))_{i \in I}$. Ordinary kriging can be viewed as a special case of universal kriging with $p = 1$, $f_1(x) = 1$ and $F = \mathbf{1} = (1)_{i \in I}$. Because correlation usually vanishes as the distance between two points increases, only points within a certain distance of a prediction point need to be taken into account when building the kriging equation system. This area around a prediction point is called the search neighborhood, and its radius corresponds to the range parameter of the semivariogram. More details can be found in Cressie [1991].

Kriging prediction for the location $x_0$ now consists of three steps:

1. Estimation of the semivariogram $\gamma$
2. Semivariogram model fitting
3. Solving equation (6) to determine the kriging weights $\lambda$ and calculating $\hat{Z}(x_0)$

Usually prediction is carried out not only for one point $x_0$. The final output of kriging is a prediction map, that is, kriging is performed for each point of a discrete prediction grid. If high-quality prediction maps are desired, dense prediction grids have to be used. This clearly increases the computational burden. Therefore it can be helpful to consider alternative approaches such as parallel programming in this context.

2 Parallelizing spatial prediction

An algorithm can only be successfully parallelized if it contains blocks that can be executed independently of each other. It is therefore necessary to examine the three steps of kriging prediction mentioned above for their potential to be executed in parallel. It is assumed that prediction has to be performed at the points of a discrete grid. The first step, semivariogram estimation, has to be executed only once per data set and prediction grid. This also holds for the second step, semivariogram fitting. This step usually involves much interaction by the analyst, e.g.
choosing the appropriate semivariogram model, which cannot be done in an automated manner. The last step, however, consists of repeatedly solving the kriging equation system, once for each grid point. Here the condition of independence of the computations for different grid points is also fulfilled. So this step clearly qualifies as a candidate for parallel execution, and it is covered in the next sections.

A simple parallel version of kriging would distribute the task of predicting at the points of the prediction grid among the members of the cluster. A supervising process has to manage the distribution of these tasks, collect the results from the cluster members and feed them new grid points until prediction for all points is done. In an initialization step at the start of prediction, all cluster members have to receive the whole data set and the semivariogram parameters from the supervising process. This process should also be responsible for some error checking.

Another idea would be to divide the dataset into smaller parts and to feed these parts to the cluster members for prediction at one or more grid locations. This could help in situations where very large datasets occur which, for example, cannot be handled at once because of resource limitations. The following sections discuss the first idea in detail.

Figure 1: Schematic overview of parallel prediction on a grid (supervising process and cluster clients)

3 Improving parallel kriging computation

It can be helpful to search for properties that allow a more effective way of parallelization. If we consider the kriging prediction grid and the parallel version described above of the usually sequential prediction on a grid, it turns out that prediction at grid points located very close to one another will yield very similar results.
This is because both points share most of their neighbors, and their distances to these neighbors are very similar. In other words, their search neighborhoods are almost identical. This suggests handling two such points $x_1$ and $x_2$ at once.
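The simple parallel scheme from Section 2, in which a supervising process hands out grid points and each worker solves the ordinary kriging system (6) with $F = \mathbf{1}$, might be sketched as follows. This is an illustrative Python sketch under several assumptions and not the paper's R/PVM code: a thread pool from `concurrent.futures` stands in for the PVM cluster, the exponential covariance `exp_cov` is a hypothetical model choice, all data points are used instead of a search neighborhood, and `solve`, `krige_point` and `krige_grid` are names introduced here.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def exp_cov(h, sill=1.0, scale=1.0):
    """Hypothetical covariance model C(h) = sill * exp(-h / scale), no nugget."""
    return sill * math.exp(-h / scale)


def krige_point(x0, points, values, cov=exp_cov):
    """Ordinary kriging at x0: build and solve system (6) with F = 1, f0 = 1."""
    n = len(points)
    a = [[cov(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    rhs = [cov(math.dist(x0, p)) for p in points] + [1.0]
    weights = solve(a, rhs)[:n]  # drop the last unknown, keep lambda
    return sum(w * z for w, z in zip(weights, values))


def krige_grid(grid, points, values, workers=4):
    """Hand the independent per-point solves to a pool of workers.

    The calling thread plays the role of the supervising process; results
    come back in the order of the grid points.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x0: krige_point(x0, points, values), grid))
```

Calling `krige_grid(grid, points, values)` with a list of coordinate tuples returns one prediction per grid point. Threads in CPython do not speed up this CPU-bound work, so the sketch only shows how the tasks are distributed; a process pool or a real cluster would be needed for an actual gain.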